An Efficient Uniform-cost Normalized Edit Distance Algorithm
Abstract
A common model for computing the similarity of two strings X and Y of lengths m and n, respectively, with m ≥ n, is to transform X into Y through a sequence of edit operations of three types: insertion, deletion, and substitution of symbols. The model assumes a given weight function which assigns a non-negative real cost to each of these edit operations. The amortized weight of a given edit sequence is the ratio of the total weight of the sequence to its length, and the minimum of this ratio over all edit sequences is the normalized edit distance between X and Y. Experimental results suggest that for some applications normalized edit distance is a better similarity measure than ordinary edit distance, which ignores the length of the sequences. Existing algorithms for computing normalized edit distance with provable complexity bounds require O(mn^2) operations in the worst case. In this paper we develop an O(mn log n) worst-case algorithm for the normalized edit distance problem with uniform weights. In this case the weight of each edit operation is constant within the same type, except that substitutions can have different costs depending on whether they are matching or non-matching.
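As a rough illustration of the quantity defined above, here is a minimal Python sketch. The function and parameter names (`normalized_edit_distance`, `w_ins`, `w_del`, `w_match`, `w_sub`) are hypothetical. It assumes the uniform weights described in the abstract: one cost per insertion, one per deletion, and separate costs for matching and non-matching substitutions. For every length k, it tabulates the cheapest edit sequence of exactly k operations and then takes the smallest weight-to-length ratio. This runs in O(mn(m + n)) time. It is a simple baseline, not the O(mn log n) algorithm the abstract announces.

```python
from fractions import Fraction
from math import inf


def normalized_edit_distance(x, y, w_ins=1, w_del=1, w_match=0, w_sub=1):
    """Return the minimum over edit sequences of (total weight / number of operations).

    Hypothetical uniform weights: w_ins per insertion, w_del per deletion,
    w_match for a matching substitution, w_sub for a non-matching one.
    Integer weights give an exact Fraction result.
    """
    m, n = len(x), len(y)
    if m == 0 and n == 0:
        return Fraction(0)  # convention for two empty strings (assumption)

    # prev[i][j]: cheapest weight of exactly k-1 operations turning x[:i] into y[:j]
    prev = [[inf] * (n + 1) for _ in range(m + 1)]
    prev[0][0] = 0
    best = inf
    # Every edit sequence has between max(m, n) and m + n operations.
    for k in range(1, m + n + 1):
        cur = [[inf] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            for j in range(n + 1):
                w = inf
                if i > 0:  # delete x[i-1]
                    w = min(w, prev[i - 1][j] + w_del)
                if j > 0:  # insert y[j-1]
                    w = min(w, prev[i][j - 1] + w_ins)
                if i > 0 and j > 0:  # substitute x[i-1] -> y[j-1] (matching or not)
                    c = w_match if x[i - 1] == y[j - 1] else w_sub
                    w = min(w, prev[i - 1][j - 1] + c)
                cur[i][j] = w
        if cur[m][n] < inf:  # some sequence of exactly k operations exists
            best = min(best, Fraction(cur[m][n]) / k)
        prev = cur
    return best
```

With the default unit weights, `normalized_edit_distance("ab", "ba")` gives 2/3. This ratio comes from deleting a, matching b, and inserting a (weight 2, length 3). Two substitutions would give a ratio of 1. Matching substitutions count toward the sequence length, which is why the longer sequence wins here.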
Similar resources
An Efficient Uniform-Cost Normalized Edit Distance Algorithm
A common model for computing the similarity of two strings X and Y of lengths m and n, respectively, with m ≥ n, is to transform X into Y through a sequence of three types of edit operations: insertion, deletion, and substitution. The model assumes a given cost function which assigns a non-negative real weight to each edit operation. The amortized weight for a given edit sequence is the ratio of i...
Efficient Algorithms For Normalized Edit Distance
ÖMER EGECIOGLU, Department of Computer Science, University of California, Santa Barbara, CA 93106, USA. ABSTRACT: A common model for computing the similarity of two strings X and Y of lengths m and n, respectively, with m ≥ n, is to transform X into Y through a sequence of edit operations, called an edit sequence. The edit operations are of three types: insertion, deletion, a...
Computation of Normalized Edit Distance and Applications
Given two strings X and Y over a finite alphabet, the normalized edit distance between X and Y, d(X, Y), is defined as the minimum of W(P)/L(P), where P is an editing path between X and Y, W(P) is the sum of the weights of the elementary edit operations of P, and L(P) is the number of these operations (the length of P). In this paper, it is shown that in general, d(X, Y) canno...
Parallel algorithms for fast computation of normalized edit distances
We give work-optimal and polylogarithmic-time parallel algorithms for solving the normalized edit distance problem. The normalized edit distance between two strings X and Y with lengths n ≥ m is the minimum quotient of the sum of the costs of edit operations transforming X into Y by the length of the edit path corresponding to those edit operations. Marzal and Vidal proposed a sequential algorith...
Towards Normalizing the Edit Distance Using a Genetic Algorithms-Based Scheme
The normalized edit distance is one of the distances derived from the edit distance. It is useful in some applications because it takes into account the lengths of the two strings compared. The normalized edit distance is not defined in terms of edit operations but rather in terms of the edit path. In this paper we propose a new derivative of the edit distance that also takes into consideration...
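The definition quoted in the "Computation of Normalized Edit Distance and Applications" entry, d(X, Y) = min over editing paths P of W(P)/L(P), can be checked directly on tiny inputs. The approach is to enumerate every editing path. The Python sketch below is a brute-force reference under an assumed uniform-weight scheme. The names `edit_paths`, `normalized_by_definition`, `w_ins`, `w_del`, `w_match`, and `w_sub` are hypothetical. The running time is exponential in the string lengths, so it is only suitable for validating faster methods.

```python
from fractions import Fraction


def edit_paths(x, y, i=0, j=0, w_ins=1, w_del=1, w_match=0, w_sub=1):
    """Yield (W(P), L(P)) for every editing path P from cell (i, j) to (len(x), len(y))."""
    if i == len(x) and j == len(y):
        yield 0, 0
        return
    steps = []
    if i < len(x):  # deletion of x[i]
        steps.append((i + 1, j, w_del))
    if j < len(y):  # insertion of y[j]
        steps.append((i, j + 1, w_ins))
    if i < len(x) and j < len(y):  # substitution x[i] -> y[j], matching or not
        steps.append((i + 1, j + 1, w_match if x[i] == y[j] else w_sub))
    for ni, nj, cost in steps:
        for w, length in edit_paths(x, y, ni, nj, w_ins, w_del, w_match, w_sub):
            yield w + cost, length + 1


def normalized_by_definition(x, y, **weights):
    """d(X, Y) = min W(P) / L(P); assumes integer weights and that x or y is non-empty."""
    return min(Fraction(w, length) for w, length in edit_paths(x, y, **weights))
```

Consider `w_ins=w_del=2` and `w_sub=3`. The cheapest way to turn "a" into "b" is a single substitution (weight 3, ratio 3). However, `normalized_by_definition("a", "b", w_ins=2, w_del=2, w_sub=3)` returns 2, which comes from a deletion plus an insertion (weight 4, length 2). So the minimum-ratio path need not be a minimum-weight path.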
Publication date: 1999